17 research outputs found

    Exploiting different levels of parallelism in the biological sequence comparison problem

    In recent years, the rapid growth of the bioinformatics field has attracted the attention of computer scientists. At the same time, the exponential growth of databases that contain biological information (such as protein and DNA data) demands great efforts to improve the performance of computational platforms. In this work, we investigate how bioinformatics applications benefit from parallel architectures that combine different alternatives to exploit coarse- and fine-grain parallelism. As a case study, we analyze the performance behavior of the Ssearch application, which implements the Smith-Waterman (SW) algorithm, a dynamic programming approach that explores the similarity between a pair of sequences. The large inherent parallelism of the application makes it ideal for architectures supporting multiple dimensions of parallelism (thread-level parallelism, TLP; data-level parallelism, DLP; instruction-level parallelism, ILP). We study how this algorithm can take advantage of different parallel machines: the SGI Altix, IBM Power6, IBM Cell BE and MareNostrum. Our study includes a qualitative analysis of the parallelization opportunities and a quantification of the performance in terms of speedup and execution time. These measurements take into account the specific characteristics of each architecture. As an example, our results show that a shared-memory multiprocessor (SMP) architecture such as the PowerPC 970MP of the MareNostrum machine can surpass a heterogeneous multiprocessor machine such as the current IBM Cell BE. Peer Reviewed. Postprint (published version)
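    The abstract describes SW only as a dynamic programming approach. As a rough illustration, the sketch below computes the SW local-alignment score with a linear gap penalty; the function name and the scoring values (match +2, mismatch -1, gap -1) are assumptions, and it is much simpler than Ssearch, which typically uses substitution matrices and affine gap penalties. Each cell depends on its left, upper and upper-left neighbours, which is the dependency structure any parallel version has to respect.

```python
def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -1) -> int:
    """Return the best local-alignment score of a and b (linear gap penalty, assumed scoring)."""
    rows, cols = len(a) + 1, len(b) + 1
    # H[i][j] = best score of a local alignment ending at a[i-1] and b[j-1]
    H = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = H[i - 1][j] + gap      # gap inserted in b
            left = H[i][j - 1] + gap    # gap inserted in a
            # The 0 lets a new local alignment start at this cell.
            H[i][j] = max(0, diag, up, left)
            best = max(best, H[i][j])
    return best


if __name__ == "__main__":
    print(smith_waterman("TGTTACGG", "GGTTGACTA", match=3, mismatch=-3, gap=-2))
```

    With match +3, mismatch -3 and gap -2, the call above scores 13 (GTT-AC aligned against GTTGAC). Both time and memory grow as O(mn) for sequences of lengths m and n.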

    Performance analysis of sequence alignment applications

    Advances in molecular biology have led to continued growth in the biological information generated by the scientific community. Additionally, this area has become a multidisciplinary field, including components of mathematics, biology, chemistry, and computer science, which raises challenges for the scientific community from several points of view. For this reason, bioinformatics applications represent an increasingly important workload. However, even though the importance of this field is clear, common bioinformatics applications and their implications for micro-architectural design have not received enough attention from the computer architecture community. This paper presents a micro-architectural performance analysis of well-known bioinformatics applications for the comparison and alignment of biological sequences, including BLAST, FASTA and several well-known parallel implementations of the Smith-Waterman algorithm that use the AltiVec SIMD extension to improve performance. We adopt a simulation-based methodology to perform a detailed workload characterization. We analyze architectural and micro-architectural aspects such as pipeline configurations, issue widths, functional unit mixes and the memory hierarchy, and their implications for performance. We found that the memory subsystem is the component with the greatest impact on the performance of the BLAST heuristic, the branch predictor is responsible for most of the performance loss in FASTA and SSEARCH34, and long dependency chains are the limiting factor in the SIMD implementations of Smith-Waterman. Peer Reviewed. Postprint (published version)

    Parallel processing in biological sequence comparison using general purpose processors

    The comparison and alignment of DNA and protein sequences are important tasks in molecular biology and bioinformatics. One of the best-known algorithms for the string-matching operation in these tasks is the Smith-Waterman (SW) algorithm. However, it is computationally intensive, and many researchers have developed heuristic strategies to avoid using it, especially when searching large databases. There are several efficient implementations of the SW algorithm on general-purpose processors. These implementations aim to extract data-level parallelism using single-instruction multiple-data (SIMD) extensions, which perform the same operation in parallel on a set of data elements. In this paper, we propose a more efficient data-parallel implementation of the SW algorithm, which obtains a 30% reduction in execution time relative to the previous best data-parallel alternative. We also review alternative implementations of the SW algorithm, compare them with our proposal, and present preliminary results for some heuristic implementations. Finally, we present a detailed study of the computational complexity of the alignment algorithms presented and of their behavior with respect to different aspects of the CPU microarchitecture. Peer Reviewed. Postprint (published version)
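    One common way to expose data-level parallelism in SW, not necessarily the scheme proposed in this paper, is to sweep the matrix along anti-diagonals: all cells with the same i + j depend only on the two previous anti-diagonals, so they can be computed together, for example in SIMD lanes. The pure-Python sketch below (hypothetical function name, assumed linear gap scoring) only illustrates the traversal order; it gains no real speed in Python.

```python
def sw_antidiagonal(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -1) -> int:
    """Smith-Waterman score computed in anti-diagonal (wavefront) order.

    Cells on anti-diagonal d = i + j read only diagonals d - 1 and d - 2,
    so the inner loop has no loop-carried dependency and could be mapped
    onto SIMD lanes. Here the parallelism is only conceptual.
    """
    m, n = len(a), len(b)
    H = [[0] * (n + 1) for _ in range(m + 1)]
    best = 0
    for d in range(2, m + n + 1):        # anti-diagonal index i + j
        i_lo = max(1, d - n)             # keep j = d - i <= n
        i_hi = min(m, d - 1)             # keep j = d - i >= 1
        for i in range(i_lo, i_hi + 1):  # independent cells: a SIMD candidate
            j = d - i
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(0, H[i - 1][j - 1] + s, H[i - 1][j] + gap, H[i][j - 1] + gap)
            best = max(best, H[i][j])
    return best


if __name__ == "__main__":
    print(sw_antidiagonal("TGTTACGG", "GGTTGACTA", match=3, mismatch=-3, gap=-2))
```

    The result is identical to a row-by-row sweep; only the order changes. The number of cells per anti-diagonal is at most min(m, n) and shrinks toward the corners of the matrix, which limits how many lanes can be kept busy there.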

    The SARC architecture

    The SARC architecture is composed of multiple processor types and a set of user-managed direct memory access (DMA) engines that let the runtime scheduler overlap data transfer and computation. The runtime system automatically allocates tasks to the heterogeneous cores and schedules the data transfers through the DMA engines. SARC's programming model supports various highly parallel applications, with matching support from specialized accelerator processors. Postprint (published version)
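    The abstract does not detail how the SARC runtime schedules DMA transfers. As a generic, hypothetical illustration of why overlapping transfer and computation pays off, the sketch below simulates one DMA engine and one core sharing two buffers (double buffering). The function names and per-block costs are assumptions, not part of SARC.

```python
from typing import List


def serial_time(n: int, t_xfer: float, t_comp: float) -> float:
    """No overlap: each block is fetched, then computed, one after the other."""
    return n * (t_xfer + t_comp)


def double_buffered_time(n: int, t_xfer: float, t_comp: float) -> float:
    """Simulate one DMA engine and one core sharing two buffers.

    While the core computes on block k, the DMA engine may already fetch
    block k + 1 into the other buffer, but only once the core has finished
    with block k - 1, which occupied that buffer.
    """
    dma_free = 0.0               # time at which the DMA engine becomes idle
    core_free = 0.0              # time at which the core becomes idle
    comp_done: List[float] = []  # completion time of the computation on each block
    for k in range(n):
        # Block k reuses the buffer of block k - 2: wait until that block is consumed.
        buf_free = comp_done[k - 2] if k >= 2 else 0.0
        xfer_end = max(dma_free, buf_free) + t_xfer
        dma_free = xfer_end
        # Computation on block k starts when its data has arrived and the core is idle.
        core_free = max(core_free, xfer_end) + t_comp
        comp_done.append(core_free)
    return core_free


def pipeline_estimate(n: int, t_xfer: float, t_comp: float) -> float:
    """Closed form for a two-stage pipeline: fill, steady state, drain."""
    return t_xfer + (n - 1) * max(t_xfer, t_comp) + t_comp if n else 0.0


if __name__ == "__main__":
    print(serial_time(4, 1, 3), double_buffered_time(4, 1, 3), pipeline_estimate(4, 1, 3))
```

    With 4 blocks costing 1 time unit to transfer and 3 to compute, the serial schedule takes 16 units and the double-buffered one 13, which agrees with the pipeline estimate: once the pipeline is full, the slower of the two stages sets the pace.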

    On the scalability of 1- and 2-dimensional SIMD extensions for multimedia applications

    SIMD extensions are the most common technique used in current processors for multimedia computing. To obtain more performance for emerging applications, SIMD extensions need to be scaled. In this paper, we perform a scalability analysis of SIMD extensions for multimedia applications, comparing the scaling of a 1-dimensional extension, such as Intel MMX, with the scaling of a 2-dimensional (matrix) extension. Evaluations show that the 2-D architecture is able to use more parallel hardware than the 1-D extension. Speed-ups over a 2-way superscalar processor with an MMX-like extension reach up to 4X for kernels and up to 3.3X for complete applications, and the matrix architecture can, in some cases, deliver more performance with simpler processor configurations. The experiments also show that the scaled matrix architecture is reaching the limits of the DLP available in the inner loops of common multimedia kernels. Peer Reviewed. Postprint (published version)

    The impact of non-additive genetic associations on age-related complex diseases

    Genome-wide association studies (GWAS) are not fully comprehensive, as current strategies typically test only the additive model, exclude the X chromosome, and use only one reference panel for genotype imputation. We implement an extensive GWAS strategy, GUIDANCE, which improves genotype imputation by using multiple reference panels and includes the analysis of the X chromosome and of non-additive models to test for association. We apply this methodology to 62,281 subjects across 22 age-related diseases and identify 94 genome-wide associated loci, including 26 previously unreported. Moreover, we observe that 27.7% of the 94 loci are missed if we use standard imputation strategies with a single reference panel, such as HRC, and test only the additive model. Among the new findings, we identify three novel low-frequency recessive variants with odds ratios larger than 4, which would need at least a three-fold larger sample size to be detected under the additive model. This study highlights the benefits of applying innovative strategies to better uncover the genetic architecture of complex diseases. Most genome-wide association studies assume an additive model, exclude the X chromosome, and use one reference panel. Here, the authors implement a strategy including non-additive models and find that the number of loci for age-related traits increases compared with the additive model alone. Peer reviewed
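    As background for the additive versus non-additive distinction, the sketch below shows the textbook ways of coding a biallelic genotype (number of risk alleles: 0, 1 or 2) as the covariate of an association test, plus Hardy-Weinberg genotype frequencies. The abstract does not list the exact models GUIDANCE tests, so these codings and the function names are assumptions for illustration only.

```python
from typing import Dict, List, Tuple

# Hypothetical codings of risk-allele counts (0, 1, 2) for an association test.
CODINGS: Dict[str, Dict[int, int]] = {
    "additive": {0: 0, 1: 1, 2: 2},   # each extra copy adds the same effect
    "dominant": {0: 0, 1: 1, 2: 1},   # one copy is enough to change risk
    "recessive": {0: 0, 1: 0, 2: 1},  # only homozygous carriers differ
}


def encode(genotypes: List[int], model: str) -> List[int]:
    """Map risk-allele counts to the covariate used under the given model."""
    table = CODINGS[model]
    return [table[g] for g in genotypes]


def hwe_frequencies(p: float) -> Tuple[float, float, float]:
    """Expected frequencies of 0, 1 and 2 risk alleles under Hardy-Weinberg equilibrium."""
    q = 1.0 - p
    return (q * q, 2 * p * q, p * p)


if __name__ == "__main__":
    genotypes = [0, 1, 2, 1, 0]
    for model in CODINGS:
        print(model, encode(genotypes, model))
    print(hwe_frequencies(0.05))
```

    For a risk-allele frequency of 0.05, only about 0.25% of individuals are expected to be homozygous (0.05 squared). Intuitively, under a purely recessive effect the heterozygotes that the additive coding scores as 1 carry no extra risk and dilute the signal, which is consistent with the abstract's point that low-frequency recessive variants need much larger samples under the additive model.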

    Quantitative analysis of sequence alignment applications on multiprocessor architectures

    The exponential growth of databases that contain biological information (such as protein and DNA data) demands great efforts to improve the performance of computational platforms. In this work, we investigate how bioinformatics applications benefit from parallel architectures that combine different alternatives to exploit coarse- and fine-grain parallelism. As a case study, we analyze the performance behavior of the Ssearch application, which implements the Smith-Waterman algorithm, a dynamic programming approach that explores the similarity between a pair of sequences. The large inherent parallelism of the algorithm makes it ideal for architectures supporting multiple dimensions of parallelism (TLP, DLP and ILP). We study how this algorithm can take advantage of different parallel machines: the SGI Altix, IBM Power6, Cell BE and MareNostrum. Our results show that a shared-memory architecture such as the PowerPC 970MP of MareNostrum can surpass a heterogeneous machine such as the current Cell BE. Our quantitative analysis includes not only a study of the scalability of performance in terms of speedup, but also an analysis of bottlenecks in the execution of the application, carried out by studying the execution phases that the application presents. Peer Reviewed

    Long DNA sequence comparison on multicore architectures

    Biological sequence comparison is one of the most important tasks in bioinformatics. Due to the growth of biological databases, sequence comparison is becoming an important challenge for high-performance computing, especially when very long sequences are compared. The Smith-Waterman (SW) algorithm is an exact method based on dynamic programming that quantifies the local similarity between sequences. The large inherent parallelism of the algorithm makes it ideal for architectures supporting multiple dimensions of parallelism (TLP, DLP and ILP). In this work, we show how the comparison of long sequences takes advantage of current and future multicore architectures. We analyze two different SW implementations on the Cell BE and use simulation tools to study performance scalability in a multicore architecture. We study the memory organization that delivers the maximum bandwidth at the minimum cost. Our results show that a heterogeneous architecture is a valid alternative for executing challenging bioinformatics workloads. Peer Reviewed
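    When very long sequences are compared, storing the full SW matrix quickly becomes infeasible. A standard memory-saving technique for computing the score alone (not necessarily what the two Cell BE implementations in this work do) keeps only the previous row, as in the sketch below; the function name and the linear gap scoring are assumptions. Recovering the alignment itself needs extra work, for example divide-and-conquer or checkpointing schemes.

```python
def sw_score_linear_space(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -1) -> int:
    """Best local-alignment score using O(len(b)) memory instead of O(len(a) * len(b))."""
    prev = [0] * (len(b) + 1)  # row i - 1 of the dynamic programming matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)  # row i; curr[0] stays 0 for local alignment
        for j in range(1, len(b) + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            curr[j] = max(0, prev[j - 1] + s, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr  # the older row is no longer needed
    return best


if __name__ == "__main__":
    print(sw_score_linear_space("TGTTACGG", "GGTTGACTA", match=3, mismatch=-3, gap=-2))
```

    For two sequences of one million bases each, the full matrix would hold 10^12 cells, while this version keeps two rows of about 10^6 cells each.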